Bike sharing systems have emerged as a popular, eco-friendly transportation alternative in cities worldwide. These systems provide a network of bicycles and docking stations, allowing users to pick up a bike from one location and drop it off at another, facilitating short, point-to-point trips. However, one significant challenge in these systems is maintaining a balanced distribution of bikes across the network. This is where bike re-balancing becomes crucial.
Due to daily commuting patterns and other socio-economic factors, certain areas in a city may experience a surplus of bikes while others face a shortage. For instance, stations in residential areas may start the morning full and empty out as people commute to work, leaving them depleted for much of the day. Conversely, business districts may accumulate an excess of bikes during the day, which need to be redistributed for the evening commute home. Without effective re-balancing, the utility and efficiency of the bike share system are significantly reduced, leading to customer dissatisfaction and a decline in usage.
Below, we propose and evaluate a rebalancing strategy for Indego, the bike share system in Philadelphia, based on a spatiotemporal predictive model. Launched in 2015, Indego has 1,400 bikes distributed across 130 stations, mostly concentrated in Center City, West Philadelphia, and South Philadelphia.[1] Based on data from the 4th quarter of 2022, as well as supplemental weather data, we propose a predictive model optimized for overnight rebalancing with trucks, which aims to minimize expected customer dissatisfaction the next day.[2] Based on our predictions, an integer program could be used to optimize redistribution routes.[3]
ggplot() +
  geom_sf(data = phlTracts) +
  geom_sf(data = stations %>%
            group_by(start_station) %>%
            summarize(count = n())) +
  labs(title = "Distribution of Indego Stations in Philadelphia",
       subtitle = "December 2022") +
  mapTheme
3 Methods
3.1 Data
3.1.1 Overview
For our analysis, we rely on three datasets: 1) Indego station and ridership data from Q4 of 2022, 2) weather data collected at Philadelphia International Airport over the same period and accessed via the riem package in R, and 3) American Community Survey data from 2021.
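As a rough illustration of how the trip and weather data might be combined into an hourly panel, the sketch below uses toy data rather than the actual Indego or airport records. The column names `interval60`, `dotw`, and `start_station` match those used in the plotting code in this report. The weather columns `valid`, `tmpf`, and `p01i` follow riem's naming, and `trip_count`, `temperature`, and `precipitation` are hypothetical names chosen for this example.

```r
library(dplyr)
library(lubridate)

# Toy trips (hypothetical values): one row per trip
trips <- tibble(
  start_time = ymd_hms(c("2022-10-03 08:05:00", "2022-10-03 08:40:00",
                         "2022-10-03 17:15:00", "2022-10-08 11:30:00")),
  start_station = c(3004, 3004, 3010, 3004)
)

# Toy weather observations (hypothetical values), possibly several per hour
weather_raw <- tibble(
  valid = ymd_hms(c("2022-10-03 08:10:00", "2022-10-03 08:54:00",
                    "2022-10-03 17:54:00", "2022-10-08 11:54:00")),
  tmpf = c(55, 57, 64, 50),   # temperature in degrees Fahrenheit
  p01i = c(0, 0.02, 0, 0)     # precipitation over the past hour, in inches
)

# Count trips per station per hour and record the day of the week
trips_hourly <- trips %>%
  mutate(interval60 = floor_date(start_time, unit = "hour"),
         dotw = wday(interval60, label = TRUE)) %>%
  count(interval60, dotw, start_station, name = "trip_count")

# Collapse weather to one row per hour; taking the maximum of p01i is an
# assumption made here to avoid double counting repeated readings in an hour
weather_hourly <- weather_raw %>%
  mutate(interval60 = floor_date(valid, unit = "hour")) %>%
  group_by(interval60) %>%
  summarize(temperature = mean(tmpf, na.rm = TRUE),
            precipitation = max(p01i, na.rm = TRUE),
            .groups = "drop")

# Attach the hourly weather to each station-hour
panel <- left_join(trips_hourly, weather_hourly, by = "interval60")
panel
```

A full panel would also need rows for station-hours with zero trips, which this sketch does not create.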
3.1.2 Exploratory Analysis
Examining our bike share data, we notice clear temporal trends: peaks during certain hours of the day and a longer-term decline in trips across the quarter.
ggplot(dat_census %>%
         group_by(interval60, start_station) %>%
         tally()) +
  geom_histogram(aes(n), binwidth = 5) +
  labs(title = "Bike share trips per hr by station. Philadelphia, Oct. - Dec., 2022",
       x = "Trip Counts", y = "Number of Stations") +
  plotTheme
During the afternoon rush hour, for example, more stations record a high number of trips.
Likewise, total hourly trip counts across the city peak during the morning and afternoon rush hours. We also observe that total ridership is meaningfully lower on weekends than during the week.
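One simple way to put a number on the weekday and weekend gap is to compare mean hourly trips between the two groups. The sketch below does this on toy values, not on the Indego data. The data frame `hourly` and the columns `trips` and `weekend` are hypothetical names used only for illustration.

```r
library(dplyr)
library(lubridate)

# Toy city-wide hourly totals (hypothetical values)
hourly <- tibble(
  interval60 = ymd_h(c("2022-10-07 08", "2022-10-07 17",    # a Friday
                       "2022-10-08 08", "2022-10-08 17")),  # a Saturday
  trips = c(120, 150, 40, 80)
)

# With week_start = 7, wday() returns 1 for Sunday and 7 for Saturday
weekend_summary <- hourly %>%
  mutate(weekend = ifelse(wday(interval60, week_start = 7) %in% c(1, 7),
                          "Weekend", "Weekday")) %>%
  group_by(weekend) %>%
  summarize(mean_hourly_trips = mean(trips), .groups = "drop")

weekend_summary
```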
dat_census %>%
  mutate(hour = hour(interval60)) %>%
  ggplot(aes(x = hour, color = as.factor(dotw))) +
  geom_freqpoly(binwidth = 1) +
  labs(title = "Bike share trips in Philadelphia, by day of the week, Oct. - Dec., 2022",
       x = "Hour", y = "Trip Counts") +
  plotTheme
ggplot(dat_census %>%
         group_by(interval60) %>%
         tally()) +
  geom_line(aes(x = interval60, y = n)) +
  labs(title = "Bike share trips per hr. Philadelphia, Oct. - Dec., 2022",
       x = "Date", y = "Number of trips") +
  plotTheme
dat_census %>%
  mutate(time_of_day = case_when(
    hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
    hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
    hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
    hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush")) %>%
  group_by(interval60, start_station, time_of_day) %>%
  tally() %>%
  group_by(start_station, time_of_day) %>%
  summarize(mean_trips = mean(n)) %>%
  ggplot() +
  geom_density(aes(mean_trips, color = time_of_day), fill = NA, alpha = 0.3) +
  labs(title = "Mean Number of Hourly Trips Per Station. Philadelphia, Oct. - Dec., 2022",
       x = "Number of trips", y = "Density", color = "Time of Day") +
  # facet_wrap(~time_of_day) +
  plotTheme
These trends are borne out spatially: the highest ridership per station clusters in the downtown and University City areas during the afternoon rush hour.
Use purrr to train and validate several models for comparison on the latter two-week test set. Perform either random k-fold cross-validation or LOGO-CV on the five-week panel. You may choose to cross-validate by time or space. Interpret your findings in the context of accuracy and generalizability.
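A minimal sketch of this workflow, on simulated data rather than the Indego panel, might look like the following. It assumes a seven-week panel split into five training weeks and two test weeks. Plain `lm()` stands in for whatever model family is ultimately chosen, and the panel columns (`week`, `hour`, `temperature`, `trip_count`) and the two formulas are hypothetical. The cross-validation shown is leave-one-group-out by week, that is, by time rather than by space.

```r
library(dplyr)
library(purrr)

set.seed(42)

# Simulated panel (hypothetical values): 7 weeks of hourly trips at one station
n <- 7 * 7 * 24
panel <- tibble(
  week = rep(1:7, each = 7 * 24),
  hour = rep(0:23, times = 7 * 7),
  temperature = runif(n, 30, 70),
  trip_count = rpois(n, lambda = 2 + 3 * (hour %in% c(8, 17)))
)

# First five weeks for training, latter two weeks for testing
train <- filter(panel, week <= 5)
test  <- filter(panel, week > 5)

# Candidate specifications to compare
formulas <- list(
  time_only    = trip_count ~ factor(hour),
  time_weather = trip_count ~ factor(hour) + temperature
)

# Mean absolute error of a fitted model on a held-out data frame
mae_on <- function(fit, newdata) {
  mean(abs(newdata$trip_count - predict(fit, newdata = newdata)))
}

# Train each specification on the training weeks, score it on the test weeks
test_mae <- map_dbl(formulas, function(f) mae_on(lm(f, data = train), test))

# LOGO-CV by week on the training panel: hold out one week at a time
cv_mae <- map_dbl(sort(unique(train$week)), function(w) {
  fit <- lm(formulas$time_weather, data = filter(train, week != w))
  mae_on(fit, filter(train, week == w))
})

tibble(model = names(test_mae), MAE = unname(test_mae))
mean(cv_mae)
```

If the fold errors in `cv_mae` are close to each other and to the test-set error, that suggests the model generalizes across weeks. A large spread would point to weeks the model handles poorly, such as holidays.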